RAW SEQUENCE LISTING DATE: 07/30/2001

PATENT APPLICATION: US/09/765,272 TIME: 12:02:09

Input Set : N:\Crf3\RULE60\09765272.txt
Output Set: N:\CRF3\07302001\l765272.raw 

SEQUENCE LISTING 

1 (1) GENERAL INFORMATION: 

(i) APPLICANT: Choi et al.
(ii) TITLE OF INVENTION: Streptococcus pneumoniae Antigens and Vaccines
(iii) NUMBER OF SEQUENCES: 452
(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Human Genome Sciences, Inc.

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 

(D) STATE: Maryland 

(E) COUNTRY: USA 

(F) ZIP: 20850 
(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 
(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US/09/765,272

(B) FILING DATE: 22-Jan-2001 

(C) CLASSIFICATION: 
(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/961,083 

(B) FILING DATE: 
(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Brookes, A. Anders 

(B) REGISTRATION NUMBER: 36,373 

(C) REFERENCE/DOCKET NUMBER: PB340P2 
(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (301) 309-8504 

(B) TELEFAX: (301) 309-8512 
(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1999 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

97 TAAAATCTAC GACAATAAAA ATCAACTCAT TGCTGACTTG GGTTCTGAAC GCCGCGTCAA 60 
99 TGCCCAAGCT AATGATATTC CCACAGATTT GGTTAAGGCA ATCGTTTCTA TCGAAGACCA 120 
101 TCGCTTCTTC GACCACAGGG GGATTGATAC CATCCGTATC CTGGGAGCTT TCTTGCGCAA 180 
103 TCTGCAAAGC AATTCCCTCC AAGGTGGATC AACTCTCACC CAACAGTTGA TTAAGTTGAC 240

105 TTACTTTTCA ACTTCGACTT CCGACCAGAC TATTTCTCGT AAGGCTCAGG AAGCTTGGTT 300 
107 AGCGATTCAG TTAGAACAAA AAGCAACCAA GCAAGAAATC TTGACCTACT ATATAAATAA 360 
109 GGTCTACATG TCTAATGGGA ACTATGGAAT GCAGACAGCA GCTCAAAACT ACTATGGTAA 420

111 AGACCTCAAT AATTTAAGTT TACCTCAGTT AGCCTTGCTG GCTGGAATGC CTCAGGCACC 480

113 AAACCAATAT GACCCCTATT CACATCCAGA AGCAGCCCAA GACCGCCGAA ACTTGGTCTT 540

115 ATCTGAAATG AAAAATCAAG GCTACATCTC TGCTGAACAG TATGAGAAAG CAGTCAATAC 600

117 ACCAATTACT GATGGACTAC AAAGTCTCAA ATCAGCAAGT AATTACCCTG CTTACATGGA 660

119 TAATTACCTC AAGGAAGTCA TCAATCAAGT TGAAGAAGAA ACAGGCTATA ACCTACTCAC 720 

121 AACTGGGATG GATGTCTACA CAAATGTAGA CCAAGAAGCT CAAAAACATC TGTGGGATAT 780 

123 TTACAATACA GACGAATACG TTGCCTATCC AGACGATGAA TTGCAAGTCG CTTCTACCAT 840

125 TGTTGATGTT TCTAACGGTA AAGTCATTGC CCAGCTAGGA GCACGCCATC AGTCAAGTAA 900 

127 TGTTTCCTTC GGAATTAACC AAGCAGTAGA AACAAACCGC GACTGGGGAT CAACTATGAA 960 

129 ACCGATCACA GACTATGCTC CTGCCTTGGA GTACGGTGTC TACGATTCAA CTGCTACTAT 1020

131 CGTTCACGAT GAGCCCTATA ACTACCCTGG GACAAATACT CCTGTTTATA ACTGGGATAG 1080

133 GGGCTACTTT GGCAACATCA CCTTGCAATA CGCCCTGCAA CAATCGCGAA ACGTCCCAGC 1140

135 CGTGGAAACT CTAAACAAGG TCGGACTCAA CCGCGCCAAG ACTTTCCTAA ATGGTCTAGG 1200 

137 AATCGACTAC CCAAGTATTC ACTACTCAAA TGCCATTTCA AGTAACACAA CCGAATCAGA 1260

139 CAAAAAATAT GGAGCAAGTA GTGAAAAGAT GGCTGCTGCT TACGCTGCCT TTGCAAATGG 1320

141 TGGAACTTAC TATAAACCAA TGTATATCCA TAAAGTCGTC TTTAGTGATG GGAGTGAAAA 1380 

143 AGAGTTCTCT AATGTCGGAA CTCGTGCCAT GAAGGAAACG ACAGCCTATA TGATGACCGA 1440

145 CATGATGAAA ACAGTCTTGA CTTATGGAAC TGGACGAAAT GCCTATCTTG CTTGGCTCCC 1500

147 TCAGGCTGGT AAAACAGGAA CCTCTAACTA TACAGACGAG GAAATTGAAA ACCACATCAA 1560

149 GACCTCTCAA TTTGTAGCAC CTGATGAACT ATTTGCTGGC TATACGCGTA AATATTCAAT 1620

151 GGCTGTATGG ACAGGCTATT CTAACCGTCT GACACCACTT GTAGGCAATG GCCTTACGGT 1680 

153 CGCTGCCAAA GTTTACCGCT CTATGATGAC CTACCTGTCT GAAGGAAGCA ATCCAGAAGA 1740

155 TTGGAATATA CCAGAGGGGC TCTACAGAAA TGGAGAATTC GTATTTAAAA ATGGTGCTCG 1800 

157 TTCTACGTGG AACTCACCTG CTCCACAACA ACCCCCATCA ACTGAAAGTT CAAGCTCATC 1860

159 ATCAGATAGT TCAACTTCAC AGTCTAGCTC AACCACTCCA AGCACAAATA ATAGTACGAC 1920

161 TACCAATCCT AACAATAATA CGCAACAATC AAATACAACC CCTGATCAAC AAAATCAGAA 1980

163 TCCTCAACCA GCACAACCA 1999
165 (2) INFORMATION FOR SEQ ID NO: 2: 

167 (i) SEQUENCE CHARACTERISTICS: 

168 (A) LENGTH: 666 amino acids 

169 (B) TYPE: amino acid 

170 (C) STRANDEDNESS: single 

171 (D) TOPOLOGY: linear 
173 (ii) MOLECULE TYPE: protein 

178 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

180 Lys Ile Tyr Asp Asn Lys Asn Gln Leu Ile Ala Asp Leu Gly Ser Glu

181 1 5 10 15

183 Arg Arg Val Asn Ala Gln Ala Asn Asp Ile Pro Thr Asp Leu Val Lys

184 20 25 30

186 Ala Ile Val Ser Ile Glu Asp His Arg Phe Phe Asp His Arg Gly Ile

187 35 40 45

189 Asp Thr Ile Arg Ile Leu Gly Ala Phe Leu Arg Asn Leu Gln Ser Asn

190 50 55 60

192 Ser Leu Gln Gly Gly Ser Thr Leu Thr Gln Gln Leu Ile Lys Leu Thr

193 65 70 75 80

195 Tyr Phe Ser Thr Ser Thr Ser Asp Gln Thr Ile Ser Arg Lys Ala Gln

196 85 90 95

198 Glu Ala Trp Leu Ala Ile Gln Leu Glu Gln Lys Ala Thr Lys Gln Glu

199 100 105 110

201 Ile Leu Thr Tyr Tyr Ile Asn Lys Val Tyr Met Ser Asn Gly Asn Tyr

202 115 120 125
277 515 520 525

279 Glu Leu Phe Ala Gly Tyr Thr Arg Lys Tyr Ser Met Ala Val Trp Thr

280 530 535 540

282 Gly Tyr Ser Asn Arg Leu Thr Pro Leu Val Gly Asn Gly Leu Thr Val

283 545 550 555 560

285 Ala Ala Lys Val Tyr Arg Ser Met Met Thr Tyr Leu Ser Glu Gly Ser

286 565 570 575

288 Asn Pro Glu Asp Trp Asn Ile Pro Glu Gly Leu Tyr Arg Asn Gly Glu

289 580 585 590

291 Phe Val Phe Lys Asn Gly Ala Arg Ser Thr Trp Asn Ser Pro Ala Pro

292 595 600 605

294 Gln Gln Pro Pro Ser Thr Glu Ser Ser Ser Ser Ser Ser Asp Ser Ser

295 610 615 620

297 Thr Ser Gln Ser Ser Ser Thr Thr Pro Ser Thr Asn Asn Ser Thr Thr

298 625 630 635 640

300 Thr Asn Pro Asn Asn Asn Thr Gln Gln Ser Asn Thr Thr Pro Asp Gln

301 645 650 655

303 Gln Asn Gln Asn Pro Gln Pro Ala Gln Pro

304 660 665 
306 (2) INFORMATION FOR SEQ ID NO: 3: 

308 (i) SEQUENCE CHARACTERISTICS: 

309 (A) LENGTH: 1714 base pairs 

310 (B) TYPE: nucleic acid 

311 (C) STRANDEDNESS: double 

312 (D) TOPOLOGY: linear 

316 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

318 AAATTACAAT ACGGACTATG AATTGACCTC TGGAGAAAAA TTACCTCTTC CTAAAGAGAT 60 
320 TTCAGGTTAC ACTTATATTG GATATATCAA AGAGGGAAAA ACGACTTCTG AGTCTGAAGT 120 
322 AAGTAATCAA AAGAGTTCAG TTGCCACTCC TACAAAACAA CAAAAGGTGG ATTATAATGT 180 
324 TACACCGAAT TTTGTAGACC ATCCATCAAC AGTACAAGCT ATTCAGGAAC AAACACCTGT 240 
326 TTCTTCAACT AAGCCGACAG AAGTTCAAGT AGTTGAAAAA CCTTTCTCTA CTGAATTAAT 300 
328 CAATCCAAGA AAAGAAGAGA AACAATCTTC AGATTCTCAA GAACAATTAG CCGAACATAA 360 
330 GAATCTAGAA ACGAAGAAAG AGGAGAAGAT TTCTCCAAAA GAAAAGACTG GGGTAAATAC 420 
332 ATTAAATCCA CAGGATGAAG TTTTATCAGG TCAATTGAAC AAACCTGAAC TCTTATATCG 480
334 TGAGGAAACT ATGGAGACAA AAATAGATTT TCAAGAAGAA ATTCAAGAAA ATCCTGATTT 540

336 AGCTGAAGGA ACTGTAAGAG TAAAACAAGA AGGTAAATTA GGTAAGAAAG TTGAAATCGT 600 
338 CAGAATATTC TCTGTAAACA AGGAAGAAGT TTCGCGAGAA ATTGTTTCAA CTTCAACGAC 660 
340 TGCGCCTAGT CCAAGAATAG TCGAAAAAGG TACTAAAAAA ACTCAAGTTA TAAAGGAACA 720
342 ACCTGAGACT GGTGTAGAAC ATAAGGACGT ACAGTCTGGA GCTATTGTTG AACCCGCAAT 780
344 TCAGCCTGAG TTGCCCGAAG CTGTAGTAAG TGACAAAGGC GAACCAGAAG TTCAACCTAC 840
346 ATTACCCGAA GCAGTTGTGA CCGACAAAGG TGAGACTGAG GTTCAACCAG AGTCGCCAGA 900
348 TACTGTGGTA AGTGATAAAG GTGAACCAGA GCAGGTAGCA CCGCTTCCAG AATATAAGGG 960
350 TAATATTGAG CAAGTAAAAC CTGAAACTCC GGTTGAGAAG ACCAAAGAAC AAGGTCCAGA 1020
352 AAAAACTGAA GAAGTTCCAG TAAAACCAAC AGAAGAAACA CCAGTAAATC CAAATGAAGG 1080
354 TACTACAGAA GGAACCTCAA TTCAAGAAGC AGAAAATCCA GTTCAACCTG CAGAAGAATC 1140
356 AACAACGAAT TCAGAGAAAG TATCACCAGA TACATCTAGC AAAAATACTG GGGAAGTGTC 1200
358 CAGTAATCCT AGTGATTCGA CAACCTCAGT TGGAGAATCA AATAAACCAG AACATAATGA 1260
360 CTCTAAAAAT GAAAATTCAG AAAAAACTGT AGAAGAAGTT CCAGTAAATC CAAATGAAGG 1320
362 CACAGTAGAA GGTACCTCAA ATCAAGAAAC AGAAAAACCA GTTCAACCTG CAGAAGAAAC 1380
364 ACAAACAAAC TCTGGGAAAA TAGCTAACGA AAATACTGGA GAAGTATCCA ATAAACCTAG 1440

366 TGATTCAAAA CCACCAGTTG AAGAATCAAA TCAACCAGAA AAAAACGGAA CTGCAACAAA 1500 

368 ACCAGAAAAT TCAGGTAATA CAACATCAGA GAATGGACAA ACAGAACCAG AACCATCAAA 1560 

370 CGGAAATTCA ACTGAGGATG TTTCAACCGA ATCAAACACA TCCAATTCAA ATGGAAACGA 1620

372 AGAAATTAAA CAAGAAAATG AACTAGACCC TGATAAAAAG GTAGAAGAAC CAGAGAAAAC 1680 

374 ACTTGAATTA AGAAATGTTT CCGACCTAGA GTTA 1714

376 (2) INFORMATION FOR SEQ ID NO: 4: 
215 220

433 Arg Ile Val Glu Lys Gly Thr Lys Lys Thr Gln Val Ile Lys Glu Gln

434 225 230 235 240

436 Pro Glu Thr Gly Val Glu His Lys Asp Val Gln Ser Gly Ala Ile Val

437 245 250 255

439 Glu Pro Ala Ile Gln Pro Glu Leu Pro Glu Ala Val Val Ser Asp Lys

440 260 265 270

442 Gly Glu Pro Glu Val Gln Pro Thr Leu Pro Glu Ala Val Val Thr Asp
VERIFICATION SUMMARY

PATENT APPLICATION: US/09/765,272

Input Set : N:\Crf3\RULE60\09765272.txt
Output Set: N:\CRF3\07302001\l765272.raw 



DATE: 07/30/2001
TIME: 12:02:10



L:47 M:220 C: Keyword misspelled or invalid format, [(A) APPLICATION NUMBER: ]
L:49 M:220 C: Keyword misspelled or invalid format, [(B) FILING DATE: ]
L:73 M:220 C: Keyword misspelled or invalid format, [(ix) TELECOMMUNICATION INFORMATION: ]
L:853 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:10
L:2649 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=41
L:2885 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=45
L:2927 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:46
L:2984 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=47
L:3320 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:52
L:3641 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:56
L:3825 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:58
L:3828 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:58
L:4317 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:66
L:4395 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:66
L:4966 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:74
L:5102 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=75
L:5172 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:76
L:5365 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:80
L:5398 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:80
L:5533 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:82
L:7004 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:7007 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:7019 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:7034 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:106
L:7075 M:111 C: (47) String data converted to upper case,
M:111 Repeated in SeqNo=107
L:7616 M:111 C: (47) String data converted to upper case,
L:7764 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:118
L:7944 M:111 C: (47) String data converted to upper case,
L:8040 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:120
L:10220 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:160
L:10343 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:162
L:10809 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:172
L:10812 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:172
L:10815 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:172
L:11039 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:176
L:11932 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:194
L:14181 M:111 C: (47) String data converted to upper case,
L:17498 M:336 Invalid Amino Acid Number in Coding Region, SEQ ID: 452